A Note on Local Ultrametricity in Text

Author

  • Fionn Murtagh

Abstract

High-dimensional, sparsely populated data spaces have been characterized in terms of ultrametric topology. This implies that there are natural, not necessarily unique, tree or hierarchy structures defined by the ultrametric topology. In this note we study the extent of local ultrametric topology in texts, with the aim of finding unique "fingerprints" for a text or corpus, discriminating between texts from different domains, and opening up the possibility of exploiting hierarchical structures in the data. We use coherent and meaningful collections of over 1000 texts, comprising over 1.3 million words.
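
The abstract does not spell out how local ultrametricity is measured. As a minimal sketch of the underlying idea (not the procedure used in the paper), the Python below samples triangles from a point set and counts how many satisfy the ultrametric inequality d(x, z) <= max(d(x, y), d(y, z)). For a single triangle, requiring this under every relabelling of its corners is the same as requiring its two largest sides to be equal. The sketch assumes each text is represented as a numeric vector (for example, a word-frequency profile) and compared with Euclidean distance. The function names, the sample size, and the 2% relative tolerance `rel_tol` are illustrative assumptions.

```python
import itertools
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_ultrametric_triangle(d_xy, d_yz, d_xz, rel_tol=0.02):
    """Hypothetical helper: True if the two largest sides agree within rel_tol.

    Exact equality of the two largest sides is equivalent to the ultrametric
    inequality holding for all three relabellings of the triangle; the
    tolerance (an assumption, not from the paper) allows for noisy data.
    """
    _small, mid, large = sorted((d_xy, d_yz, d_xz))
    return math.isclose(mid, large, rel_tol=rel_tol)


def ultrametricity_fraction(points, n_triangles=1000, rel_tol=0.02, seed=0):
    """Hypothetical measure: fraction of triangles that are ultrametric.

    Uses every triple when there are at most n_triangles of them,
    otherwise draws n_triangles random triples with a seeded generator.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    rng = random.Random(seed)
    n_all = math.comb(n, 3)
    if n_all <= n_triangles:
        triples = itertools.combinations(range(n), 3)
        total = n_all
    else:
        triples = (rng.sample(range(n), 3) for _ in range(n_triangles))
        total = n_triangles
    hits = 0
    for i, j, k in triples:
        d_ij = euclidean(points[i], points[j])
        d_jk = euclidean(points[j], points[k])
        d_ik = euclidean(points[i], points[k])
        if is_ultrametric_triangle(d_ij, d_jk, d_ik, rel_tol):
            hits += 1
    return hits / total
```

For example, four collinear points (0, 0), (1, 0), (2, 0), (3, 0) give 0.0. Two tight, well-separated pairs, (0, 0), (0, 0.1), (10, 0), (10, 0.1), give 1.0, since every triangle there has two long, nearly equal sides and a short base.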

Similar resources

Ultrametricity in Data: Identifying and Exploiting Local and Global Hierarchical Structure

We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How the extent or degree of ultrametricity can be quantified leads us to discuss various practical cases in which ultrametricity can be partially or locally present in data. We show how ultrametricity can be assessed in text or document collections, and in time series signals.

Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)

Document images produced by a scanner or digital camera usually suffer from geometric and photometric distortions. Both degrade the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions, with the aim of improving OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...

From Data to the Physics Using Ultrametrics: New Results in High Dimensional Data Analysis

We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How the extent or degree of ultrametricity can be quantified leads us to discuss various practical cases in which ultrametricity can be partially or locally present in data. We show how ultrametricity can be assessed in text or document collections, in time series signals, and in other areas. We conc...

Identifying and Exploiting Ultrametricity

We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How the extent or degree of ultrametricity can be quantified leads us to discuss various practical cases in which ultrametricity can be partially or locally present in data. We show how ultrametricity can be assessed in text or document collections, and in time series signals. In our presentation we ...

Ultrametric Model of Mind, II: Application to Text Content Analysis

In a companion paper, Murtagh (2012), we discussed how Matte Blanco’s work linked the unrepressed unconscious (in the human) to symmetric logic and thought processes. We showed how ultrametric topology provides a most useful representational and computational framework for this. Now we look at the extent to which we can find ultrametricity in text. We use coherent and meaningful collections of n...

Journal title:
  • CoRR

Volume: abs/cs/0701181

Publication date: 2005